Abstract

It is shown that the algorithmic minimal sufficient statistic of a binary string x, represented either as a finite set, a computable semimeasure, or a computable function, has a length larger than the computational depth of x, and can solve the Halting problem for all programs with length shorter than the m-depth of x. It is also shown that there are strings for which the algorithmic minimal sufficient statistic can contain a substantial amount of information that is not Halting information. The weak sufficient statistic is introduced, and it is shown that a minimal weak sufficient statistic for x is equivalent to a minimal typical model of x, and to the Halting problem for all strings shorter than the BB-depth of x.
1 Introduction

In statistics, a sufficient statistic relative to a parametrized family of 
probability distributions with some prior distribution for the parameter, 
is a function of the data that contains enough information to do some 
Bayesian inference of the parameter from the distribution that generated 
the data jSH^. 
The definition of an algorithmic sufficient statistic of a string x was introduced as the absolute notion of a sufficient statistic from statistical theory; it is defined without reference to a parametrized distribution and thus without a prior distribution [8]. The minimal algorithmic sufficient statistic is interpreted as the "meaningful information" of x, [T4"|, IT], and the remaining information of x is interpreted as "noise". This interpretation was applied to image denoising jT]. The minimal sufficient statistic can be defined with a representation as either a finite set, a computable semimeasure, or a computable function. It is also related to the structure function, and therefore to inference methods such as minimum description length and Bayesian maximum likelihood induction [15]. It has been shown that these induction methods perform induction in more or less the same way [12].

In this paper, it is investigated whether the "meaningful information" represented by a minimal sufficient statistic contains the same information as an initial segment of the Halting sequence. It is shown in Proposition 4.3 that the algorithmic minimal sufficient statistic of a string x can compute the Halting problem for all strings with length shorter than the m-depth. In Proposition 5.1 it is shown that the minimal sufficient statistic can also carry an amount of "noise", that is, information not related to the Halting problem. Weak sufficient statistics are introduced, which are also sufficient statistics within super-logarithmic¹ bounds in the length of x. A minimal weak sufficient statistic is constructed and it is shown in Proposition 6.7 that it is equivalent with some initial segment of the Halting sequence. Finally, typical models are investigated and the equivalence of a minimal typical model and a minimal weak sufficient statistic is shown within constant bounds in Proposition 7.6.

The minimal sufficient statistic is not computable, and can therefore not be directly implemented by any practical computer. However, it can be approximated with data compressors. In this respect many enumerable or limit-computable functions in the theory represent flexible place-holders for programs that can be reused, or for programs presenting improving solutions for a task through time. For a theory that tries to interpret algorithms as in [TJ, it suffices to be accurate within logarithmic bounds. However, if one wants to know whether there is a correspondence with Halting information, the theory needs to be developed in full detail.

¹ The super-logarithm is the inverse of the tetration function. The tetration function is defined by the sequence: $1, 2, 2^2, 2^{2^2}, \ldots$
2 Definitions and notation

All results here are given in the length conditional setting. This reduces the technical details for some results. The choice is also justified by the observation that in most applications of statistics or machine learning algorithms the size of the available data is predefined or contains no relevant information related to the problem.

For excellent introductions to Kolmogorov complexity we refer to P22 [11]. Let $\omega$ be the set of natural numbers and, for any set $S$, let $S^{<\omega}$ and $S^n$ be the sets of sequences of elements of $S$ of finite length and of length $n$. Let $2^{<\omega}$ and $2^n$ be the sets of the finite binary sequences and of the binary sequences of length $n$. The natural association

$2^{<\omega} \to \omega : \varepsilon \to 0,\ 0 \to 1,\ 1 \to 2,\ 00 \to 3, \ldots$

is implicitly used where needed. For any $x \in 2^{<\omega}$, $x^i = x_1 \ldots x_i$.
An interpreter $\Phi$ is a partial computable function:

$\Phi : \omega \times 2^{<\omega} \times \omega^{<\omega} \to \omega^{<\omega} : t, p, x \to \Phi_t(p|x)$,

and $\Phi(p|x) = \lim_{t \to \infty} \Phi_t(p|x)$. The use of $\omega^{<\omega}$ in this definition is to allow $\Phi$ to have multiple inputs and outputs in $\omega$ or $2^{<\omega}$. An interpreter is prefix-free if for any $x$, the set $D_x$ of all $p$ where $\Phi(p|x)$ is defined, is prefix-free. Let $\Phi$ be some fixed optimal universal prefix-free interpreter.
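The natural association and the prefix-free condition can be illustrated with a small sketch. The function names below are hypothetical and chosen for this illustration only; the Kraft sum is included because a prefix-free set of programs always has a Kraft sum of at most 1, which is what makes the semimeasures used later well defined.

```python
from fractions import Fraction


def string_to_nat(s: str) -> int:
    """Natural association: '' -> 0, '0' -> 1, '1' -> 2, '00' -> 3, ..."""
    # Prepending a 1 and reading the result as a binary number enumerates
    # the strings in length-lexicographic order, starting from 1.
    return int("1" + s, 2) - 1


def nat_to_string(k: int) -> str:
    """Inverse of string_to_nat."""
    # bin(k + 1) looks like '0b1...'; drop the '0b' and the leading 1.
    return bin(k + 1)[3:]


def is_prefix_free(words) -> bool:
    """True if no word in the collection is a proper prefix of another."""
    ws = sorted(set(words))
    # In lexicographic order, a word that is a prefix of any other word
    # is also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(ws, ws[1:]))


def kraft_sum(words) -> Fraction:
    """Sum of 2^(-l(p)) over the distinct words p."""
    return sum((Fraction(1, 2 ** len(w)) for w in set(words)), Fraction(0))
```

For example, `["0", "10", "110"]` is prefix-free with Kraft sum 7/8, whereas `["0", "01"]` is not prefix-free.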

For $n \in \omega$ and $x, y \in \omega^{<\omega}$, the Kolmogorov complexity $K(x|y)$ is defined as:

$K_t(x|y) = \min\{l(p) : \Phi_t(p|y) \downarrow= x\}$

$K_t(x) = K_t(x|\varepsilon)$.

$K(x|y)$ and $K(x)$ are obtained by taking the limit in $t$. Remark that in the definition of $K$, the parameter $n$ is always implicitly assumed to be available for $\Phi$. For functions $f, g$, the notation $f \leq^+ g$ and $f =^+ g$ is used for $f \leq g + O(1)$ and $f = g \pm O(1)$. With abuse of notation let $\log x = [\log x] = l(x)$. Remark that given $\log x$, $x$ can be recovered from its binary representation, and therefore: $K(x|\log x) \leq^+ \log x$.
Prefix-free Kolmogorov complexity is additive:

$K(x,y) =^+ K(x) + K(y|x^*)$, (1)

where $x^*$ is a program of length $K(x)$ that outputs $x$.
For $x, y \in \omega^{<\omega}$, $n \in \omega$,

$x \to y$

means that there is a program $p$ with $l(p) \leq O(1)$, such that $\Phi(p|x,n) \downarrow= y$. Remark that $\Phi$ is also conditioned on $n$. Also remark that if $x \to y$, then $K(y) \leq^+ K(x)$.
Lemma 2.1. For any $w, p \in \omega$ with $\Phi(p) \downarrow= w$ and $l(p) \leq^+ K(w)$ we have

$w^* \to p$. (2)

This is shown in [3], or follows from combining the results of [11, Exercise ...].

The complexity of a finite set $S$ is the minimal length of a program on $\Phi$ that enumerates all elements of $S$ and halts. The complexity of a computable function $f$ is: $\min\{l(p) : \forall x[\Phi(p|x) \downarrow= f(x)]\}$.

A length conditional semimeasure is a positive function $P$ such that for every $n$:

$\sum\{P(x) : x \in 2^n\} \leq 1$.

From now on, only length conditional semimeasures are used, and they are referred to as semimeasures. A semimeasure $P$ multiplicatively dominates a semimeasure $Q$, notation $P \geq^* Q$, if there is a constant $c$ such that $cP \geq Q$. $P =^* Q$ means $P \geq^* Q$ and $Q \geq^* P$. A semimeasure $P$ is universal in a set $S$ of semimeasures if $P \in S$ and $P$ dominates all semimeasures in $S$. Let $m$ be a universal semimeasure in the set of enumerable semimeasures. By the coding theorem it satisfies:

$-\log m(x) =^+ K(x)$.

3 m-depth

m-depth has been studied in detail in earlier work, where it is defined depending on the choice of the universal semimeasure. It is also used in [1], where its logarithm appeared as a bound for an on-line coding result. m-depth can be interpreted as an alternate notion of "sophistication" of a string [Tj. Here it suffices to use m-depth for the specific choice of $m_t(x) = \sum\{2^{-l(p)} : \Phi_t(p) \downarrow= x\}$. This m-depth was introduced in P [TO] without being named. It is shown that it dominates Busy Beaver depth and coarse sophistication [21 E]; however, it is more unstable in the sense that the m-depth of a binary string can vary unboundedly for small changes of the constants in its definition [3].

Definition 3.1. The computational m-depth $k_x$ of $x \in 2^n$ is given by:

$\Omega^n_t = \sum\{2^{-l(p)} : \Phi_t(p) \in 2^n\}$

$\Omega^n = \lim_{t \to \infty} \Omega^n_t$

$t_k = \min\{t : \Omega^n - \Omega^n_t \leq 2^{-k}\}$

$k_x = \min\{k : K_{t_k}(x) =^+ K(x)\}$. (3)
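The quantities in Definition 3.1 are not computable for a universal interpreter, but their interplay can be illustrated on a finite toy machine. The sketch below is an assumption-laden illustration, not the construction of the paper: a "machine" is a hypothetical list of triples (program length, halting time, output), the $=^+$ in equation (3) is replaced by exact equality, and the threshold in $t_k$ is read as "at most $2^{-k}$".

```python
from fractions import Fraction


def omega_at(programs, t):
    """Omega_t: total weight 2^(-length) of the programs halted by time t."""
    return sum(
        (Fraction(1, 2 ** length) for length, halt, _ in programs if halt <= t),
        Fraction(0),
    )


def t_k(programs, k):
    """Smallest time t at which Omega - Omega_t is at most 2^(-k)."""
    halting_times = sorted({halt for _, halt, _ in programs})
    omega = omega_at(programs, halting_times[-1])
    # Omega_t only changes at halting times, so these are the only candidates.
    for t in [0] + halting_times:
        if omega - omega_at(programs, t) <= Fraction(1, 2 ** k):
            return t


def complexity_at(programs, x, t):
    """K_t(x): length of the shortest program that outputs x within time t."""
    lengths = [length for length, halt, out in programs if out == x and halt <= t]
    return min(lengths) if lengths else None


def m_depth(programs, x):
    """Toy m-depth: least k such that K_{t_k}(x) equals the limit value K(x)."""
    final_time = max(halt for _, halt, _ in programs)
    target = complexity_at(programs, x, final_time)
    if target is None:
        raise ValueError("no program outputs x")
    k = 0
    while complexity_at(programs, x, t_k(programs, k)) != target:
        k += 1
    return k


# Hypothetical toy machine: (program length, halting time, output).
TOY = [(1, 5, "a"), (2, 3, "b"), (3, 40, "c"), (4, 7, "c")]
```

In this toy machine the shortest program for "c" halts late, so "c" is deeper than "a" even though both have short descriptions.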
Let $\Omega^{n,j}$ be the first $j$ bits of the binary expansion of $\Omega^n$. The Halting sequence $H$ is given by:

$H^n_i = 1$ if $\Phi(i|n) \downarrow$, and $H^n_i = 0$ otherwise.

According to Lemma 3.2, some initial segment of $H$ and $\Omega^n$ carry the same information, and the binary expansion of $\Omega^n$ is incompressible.

Lemma 3.2. For j ^ n: Q n ^ — > ana > K(ft n ' j \n) ^+ j. 

The proof is identical to that in [11, Claims 3.6.1, 3.6.2], and is repeated here using m-depth.

Proof. Remark that $\Omega^{n,j} \to t_j$, by searching for the smallest $t$ with $\Omega^{n,j}_t \geq \Omega^{n,j}$. Any halting program of length shorter than $j - O(1)$ defines some program that outputs a string in $2^n$, and therefore contributes at least $2^{-j}$ to $\Omega^{n,j}$. By definition of $t_j$, any program of length less than $j - O(1)$ that halts must have computation time below $t_j$. Consequently,

o"" —>/,.., .//^ ; " . 

If $K(\Omega^{n,j}) \leq j - c$, for $c$ large enough, the corresponding program generating $\Omega^{n,j}$ can be turned into a halting program with computation time larger than $t_j$, contradicting the previous paragraph. □

Let $K^H(x|y)$ be the Kolmogorov complexity relative to $\Phi^H$, that is, $\Phi$ with an oracle that contains the Halting sequence $H$, and let $I(x;H) = K(x) - K^H(x)$.



Proposition 3.3.

$x, K(x), k_x \to \Omega^{n,k_x}$,

$K(x|n) =^+ k_x + K(x|\Omega^{n,k_x}, n) \pm 2\log k_x$,

and

$I(x;H) \geq^+ k_x - 2\log k_x$.

Proof. First an alternate characterization of m-depth is given. Let $p_1, p_2, \ldots$ be an enumeration of all halting programs ordered by halting time, that is, for all $t$, if $j < i$ and $\Phi_t(p_i) \downarrow$, then $\Phi_t(p_j) \downarrow$. Let $p$ be a halting program, and let $i$ be such that $p = p_i$; then let

$\alpha_p = \sum\{2^{-l(p_j)} : 1 \leq j < i\}$.
Let $\beta_p$ be the first $l(p)$ bits of $\alpha_p$ in its binary expansion. Remark that the set of all $\beta_p$ for all halting programs $p$ is prefix-free, and

$p \leftrightarrow \beta_p$.

Let $\gamma_p$ be the largest prefix $\beta^l$ of $\beta_p$ such that $\Omega^n - \beta_p \leq 2^{-l}$. Let $x^*$ be the first program in the enumeration $p_1, p_2, \ldots$ with $\Phi(x^*) \downarrow= x$ and $l(x^*) =^+ K(x)$ with the same constant as implicit in equation (3). Let $s$ be the computation time of $x^*$. Remark that $t_{k_x - 1} \leq s \leq t_{k_x}$. Therefore

$\Omega^n - 2^{-k_x+1} < \alpha_{x^*} < \Omega^n - 2^{-k_x}$,

and it follows that

7(B . ( 4 ) 

This shows that 

which shows the first claim of Proposition 3.3.

Remark that the set of all $p$ such that $\gamma_{x^*}$ is a prefix of $\beta_p$ is prefix-free. Therefore, given $\gamma_{x^*}$, the remaining $l(x^*) - k_x$ bits of $\beta_{x^*}$ define a halting program for $x$ given $\gamma_{x^*}$. Consequently,

$K(x|\Omega^{n,k_x}) \leq^+ l(x^*) - k_x =^+ K(x) - k_x$.

Remark that $K(\Omega^{n,k_x}) \leq^+ k_x + 2\log k_x$. Therefore:

$K(x|\Omega^{n,k_x}) \geq^+ K(x|(\Omega^{n,k_x})^*)$

$=^+ K(x, \Omega^{n,k_x}) - K(\Omega^{n,k_x})$

$\geq^+ K(x, \Omega^{n,k_x}) - k_x - 2\log k_x$

$\geq^+ K(x) - k_x - 2\log k_x$.

This shows the second claim of Proposition 3.3.

It remains to show the last claim. Remark that $k, H \to \Omega^{n,k}$.

$K^H(x) \leq^+ K^H(x|k_x) + 2\log k_x$

$=^+ K^H(x|k_x, \Omega^{n,k_x}) + 2\log k_x$

$\leq^+ K(x|k_x, \Omega^{n,k_x}) + 2\log k_x$

$\leq^+ K(x) - k_x + 2\log k_x$.

Therefore,

$I(x;H) = K(x) - K^H(x) \geq^+ k_x - 2\log k_x$.

□

In the proof of Proposition 4.3 it will be shown that the $\log k_x$-terms are necessary. The construction of an explicit weak sufficient statistic in Section 6 can be considered as an exact variant of this Proposition.
4 Algorithmic sufficient statistics

The algorithmic minimal set sufficient statistic was introduced in [8]. The probabilistic and function variants are introduced in [TJj. For technical reasons the length conditional variants are used here.

Definition 4.1.
• A finite set $S$ is a sufficient set statistic of a binary string $x$ iff $x \in S$ and

$K(S) + \log|S| =^+ K(x)$. (5)

• A computable semimeasure $P$ is a sufficient probabilistic statistic of a binary string $x$ iff

$K(P) - \log P(x) =^+ K(x)$.

• A computable prefix-free² function $F : \omega \to \omega$ is a sufficient function statistic of a binary string $x$ iff for some $d$ with $F(d) = x$:

$K(F) + l(d) =^+ K(x)$.

For $Z = S, P, F$, a minimal sufficient statistic $Z_x$ is the sufficient statistic $Z$ such that $K(Z)$ is minimal within a constant. Let $l^Z_x = K(Z_x)$.

For $Z = S, P, F$, let $\|\log Z\|$ be either $\log|S|$, $-\log P(x)$, or $\min\{l(d) : F(d) = x\}$. The definitions of a sufficient statistic (SS) are summarized by:

$K(Z) + \|\log Z\| =^+ K(x)$.

Proposition 4.2. Every probabilistic SS of x generates a functional SS of x. Every functional SS of x generates a probabilistic SS of x.

Proof. The first claim of the proposition is shown by applying Shannon-Fano coding [11]. Suppose $P$ is a SS of $x$, then let for any $y$:

$\alpha_y = \sum\{P(z) : z < y\}$,

and let $\beta_y$ be the first $-\log P(y)$ bits of $\alpha_y$. Let

$F : \omega \to \omega : \beta_y \to y$.

Remark that $F$ is computable, injective and prefix-free. If $d$ is the inverse of $x$ under $F$, then $l(d) = -\log P(x)$; therefore, $F$ is a SS.
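The coding step in the first claim can be made concrete on a finite distribution. The sketch below is an illustration under stated assumptions rather than the exact code of the proof: it uses the Shannon-Fano-Elias variant, which truncates the midpoint of each probability interval instead of its left end and spends one extra bit, so that prefix-freeness holds for any ordering of the outcomes. The code lengths therefore differ from $-\log P(y)$ by at most an additive constant, which is harmless in the $=^+$ setting. The function name and the example distribution are hypothetical.

```python
from fractions import Fraction


def shannon_fano_elias(probs):
    """Map each outcome y to a code word of about -log2 P(y) + 1 bits.

    `probs` is a dict from outcomes to positive Fractions summing to at
    most 1; the dict order fixes the order of the cumulative sums.
    """
    code = {}
    cumulative = Fraction(0)  # alpha_y: total probability of earlier outcomes
    for y, p in probs.items():
        # Smallest length with 2^(-length) <= p, plus one extra bit.
        length = 0
        while Fraction(1, 2 ** length) > p:
            length += 1
        length += 1
        # Binary expansion of the midpoint of [alpha_y, alpha_y + p).
        frac = cumulative + p / 2
        bits = []
        for _ in range(length):
            frac *= 2
            bit = 1 if frac >= 1 else 0
            bits.append(str(bit))
            frac -= bit
        code[y] = "".join(bits)
        cumulative += p
    return code


# Hypothetical example distribution.
EXAMPLE = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}
```

Inverting the resulting dictionary gives a finite analogue of the function $F$, mapping code words back to outcomes.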

² Remark that it is required here that $F$ is prefix-free, in contrast with [14]. If $F$ were not required to be prefix-free, then it would follow that there are strings with $K(F) + l(d) \leq^+ K(x) - \log n$.
The second claim of the proposition is now shown. Suppose $F$ is a SS of $x$, then let for any $y$:

$P(y) = \max\{2^{-l(d)-1} : F(d) = y\}$ if $\exists d < 2y\,[F(d) = y]$, and $P(y) = 1/(4y^2)$ otherwise.

Remark that $P$ is computable and that $P$ is a semimeasure:

$\sum_y P(y) \leq \sum_y 1/(4y^2) + \frac{1}{2}\sum\{2^{-l(d)} : d \in \mathrm{dom}\, F\} \leq 1$.

□

Remark that $S^* \to |S| \to \log|S|$. Let $S^*$ be the shortest program that enumerates $S$ and halts, and let $i$ be the index of $x$ in this enumeration. A prefix-free encoding of $x$ using $S^*, i$ requires $K(S_x) + \log|S_x| + O(1)$ bits. Therefore, using Proposition 3.3:

$S^*, i, k_x \to x, K(x), k_x \to \Omega^{n,k_x}$, (6)

where $x^*$ is the witness of $K(x)$ and $\Omega^{n,k_x}$ are the first $k_x$ bits of $\Omega^n$.
The question arises whether



and if not, how do these differ? An analogous argument holds for the probabilistic and the function case.

Proposition 4.3. For all $x$:

$l^S_x \geq^+ k_x - 2\log k_x$.

Proof. Let $s$ be the computation time of the program $S^*$, the shortest program of length $K(S_x|n)$ that computes $S$ from $n$. Let $f$ be a large enough computable function such that using equation (5) it follows that $K_{f(s)}(x) \leq^+ K(x)$, and therefore $f(s) \geq t_{k_x - 1}$. This shows that

S x -> k>x > fi s ): kx * tkx—ii k x ► ^ ' * ► ^ ' !B - 
By Lemma 3.2, $K(\Omega^{n,k_x}) \geq^+ k_x$ and therefore $l^S_x \geq^+ k_x - 2\log k_x$. □

Lemma 4.4. For all $x \in 2^n$ with $K(x) = l(x)/2$:

I'x > + I'x 

ll > + It 
In [13] it is shown that every set SS generates a probabilistic SS and every probabilistic SS generates a function SS. Below it will be shown that every function SS generates a set SS if $K(x)$ can be computed from $n$. Since a function SS generates a probabilistic SS, this finishes the proof.

Proof. Given F x , the set 

S F = {./(//) ://C- T* ">} 
contains $x$, and has $K(S_F) \leq^+ l^F_x$. Moreover

$\log|S_F| \leq^+ n/2 - l^F_x \leq^+ K(x) - K(S_F)$.
Therefore, $l^F_x \geq^+ l^S_x$. □



5 A minimal sufficient statistic can carry 
non-Halting information 

Proposition 5.1 shows that the minimal sufficient statistic can carry a substantial amount of information that is not Halting information.

Proposition 5.1. For $Z = S, P, F$:

$\forall c\, \exists^\infty x\, [\, l^Z_x \geq^+ (k_x)^c \wedge I(x;H) \leq^+ k_x \,]$.

$\exists \nu > 0\, \exists^\infty x\, [\, l^Z_x \geq^+ \nu\, l(x) + k_x \wedge I(x;H) \leq^+ k_x \,]$.

First a sketch of the proof of the Proposition is given. Let $x^*$ be a program of length $K(x)$ that produces $x$. If $Z$ is a SS, then it will be shown that

$x^* \to Z, K(Z)$.

This means that a shortest program for $x$ generates $K(Z)$. If $Z$ were equivalent with $\Omega^{n,i}$ for some $i$, then $i$ could be computed from $x, K(x)$. However, an $x$ will be constructed such that $x^*$ has a computational m-depth of $i$, but $i$ has a high complexity given $x^*$. This shows that $x^*$ does not compute $i$, and that there can be no SS $Z$ of length $i$. Since $i$ has large complexity given $x^*$, numbers close to $i$ also have large complexity given $x^*$. This will allow lower bounds to be derived for the minimal sufficient statistic relative to the m-depth. Before the proof of Proposition 5.1 is given, Lemmas 5.2 to 5.7 are proved.

Lemma 5.2. Let $x \in 2^n$, and $i \leq n/2$ such that

$K(x|i^*) =^+ n$

$x_i = 1$.

There is a $y \in 2^{n/2}$ such that:

$x^i = y^i$

$y < x^{n/2}$

$K(y) =^+ n/2$

$K(i|y) =^+ K(i)$

$I(y;H) \leq^+ i$


Proof. Applying additivity of prefix-free Kolmogorov complexity, equation (1):

$K(x^i|i^*) =^+ K(x|i^*) - K(x_{i \ldots n}|(x^i)^*, i^*)$

$\geq^+ n - (n - i)$

and therefore: $K(x^i|i^*) =^+ i$.
Choose $v \in 2^{n/2-i-1}$ such that

$K^H(v|x^i, i^*) \geq n/2 - i - 1$.

Such $v$ always exists. Let $y = x^i 0 v$. Obviously, the first two conditions of the Lemma are satisfied.

Since $K(x^i) =^+ i$, $x^i \leftrightarrow (x^i)^*$. Applying additivity of prefix-free Kolmogorov complexity:

$K(y|i^*) =^+ K(x^i|i^*) + K(v|x^i, i^*)$

$=^+ i + n/2 - i - 1$

$=^+ n/2$.

Therefore, also the third condition is satisfied.
Remark that $y \leftrightarrow y^*$ such that:

$K(i|y,n) =^+ K(i,y) - K(y)$

$=^+ K(y|i) + K(i) - K(y)$

$\leq^+ n/2 + K(i) - n/2$

$= K(i)$.

Therefore, also the fourth condition is satisfied.
Remark that:

$K^H(y) \geq^+ K^H(v) \geq^+ n/2 - i$

$K(y) \leq^+ n/2$.

Therefore, also the fifth condition is satisfied. □



Lemmas 5.3 and 5.4 show that if $i$ cannot be computed from $x$, then numbers in some neighbourhood of $i$ cannot be computed from $x$ either. Let $\log^{(k)} i$ be the $k$-th iteration $\log \ldots \log i$.

Lemma 5.3. Let $c$ be constant. If

$K(i|x) \geq^+ \log i + \log^{(2)} i + \log^{(3)} i$,

then

$\min\{K(j|x) : i^{1/c} < j \leq i^c\} \geq \log^{(3)} i - O(\log^{(4)} i)$.

Proof. The proof of the version conditioned on $x$ is the same as that of the unconditioned version, which will be shown here.

$K(i) =^+ K(i, \log i, \log^{(2)} i, \log^{(3)} i)$

$=^+ K(i|(\log i)^*, (\log^{(2)} i)^*, (\log^{(3)} i)^*)$
$+ K(\log i|(\log^{(2)} i)^*, (\log^{(3)} i)^*)$
$+ K(\log^{(2)} i|(\log^{(3)} i)^*)$
$+ K(\log^{(3)} i)$. (7)

Since $K(w|\log w) \leq^+ \log w$ and $K(w) \leq^+ 2\log w$, we have that

$K(\log^{(2)} i) \geq^+ K(\log^{(2)} i|(\log^{(3)} i)^*)$

$=^+ K(i) - K(i|(\log i)^*, (\log^{(2)} i)^*, (\log^{(3)} i)^*)$
$- K(\log i|(\log^{(2)} i)^*, (\log^{(3)} i)^*) - K(\log^{(3)} i)$

$\geq^+ \log^{(3)} i - O(\log^{(4)} i)$.

Remark that:

$\log^{(2)} i =^+ \log((1/c)\log i) \leq \log^{(2)} j \leq \log(c \log i) =^+ \log^{(2)} i$,

therefore,

$K(j) \geq^+ K(\log^{(2)} j) \geq^+ \log^{(3)} i$.

□



Lemma 5.4. For any $c$, let $\bar{i}$ be the $c$ most significant bits of $i$. If $i(1 - 2^{-c}) \leq j \leq i(1 + 2^{-c})$, then $K(j|n) \geq^+ K(\bar{i}|n)$.

Proof. Trivial. □

In the proof of Proposition 5.1 an $i$ will be needed that satisfies the conditions of both Lemma 5.2 and Lemma 5.3. Lemmas 5.5, 5.6 and 5.7 show that such an $i$ can be constructed.
Lemma 5.5. For any $x$ with $K(x) \geq^+ n$, there is at least one $i$ satisfying both conditions of Lemmas 5.2 and 5.3, and there is at least one $i$ satisfying both conditions of Lemmas 5.2 and 5.4.

Proof. The first claim of the lemma implies the second claim, so only the first is shown here. Remark that since $K$ is implicitly conditioned on $n$, it follows for $\log i \geq (\log n)/2$ and $i \leq n$ that

$K(i) \leq^+ K(i|\log i) + K(\log i)$

$\leq^+ \log i + 2\log(\log n - \log i)$

$\leq^+ \log n$.

This shows that for every $i \leq n$, there is a $p \in 2^{<\log n + O(1)}$ such that $\Phi(p|n) \downarrow= i$. Therefore, if $K(x) \geq^+ n$, then by Lemma 5.6 for any $\nu$ there are maximally $\nu n$ different $i$ for which $K(x|i^*) =^+ n$ fails. By Lemma 5.7 there are also only $n/8$ different $i \leq n/2$ such that the condition of Lemma 5.3 is not satisfied. Finally, there are maximally $n/4 + 2\log n + O(1)$ different $i$ such that $x_i = 0$, since otherwise $x$ could be compressed. This shows that there are maximally

$\nu n + n/4 + 2\log n + n/8 + O(1)$

many $i \leq n/2$ that do not satisfy the conditions of Lemmas 5.2 and 5.3. Therefore, for $\nu$ sufficiently small, there must be at least one $i$ satisfying the conditions of Lemmas 5.2 and 5.3. □

Lemma 5.6. Let $\nu > 0$, and

$S_{x,c} = \{p \in 2^{<\log n + O(1)} : K(x|p) < n - c\}$.

There is a $c$ such that for any $x$ with $K(x) \geq^+ n$:

$|S_{x,c}| \leq \nu n$.

Proof. Let

$U_{x,c} = \{(p,q) : q \in 2^{<n-c} \wedge p \in 2^{<\log n + O(1)} \wedge \Phi(q|p) \downarrow= x\}$.

Suppose that

$\forall c\, \exists x\, [|S_{x,c}| > \nu n]$, (8)

then,

$\forall c\, \exists x\, [|U_{x,c}| \geq \nu n]$.

Let

P{x) = \U x>c \n2~ n . 
Remark that $U_{x,c}$ is enumerable, and therefore $P(x)$ can be enumerated from $x, c$. Applying the coding theorem shows that:

$K(x) \leq^+ -\log P(x) + K(P)$

$\leq^+ \log \nu + n - c + 2\log c$.

Since $K(x) \geq^+ n$, this shows that

$c - 2\log c \leq^+ \nu$,

which contradicts the generality of $c$ in equation (8). □
Lemma 5.7. There are manyi < | satisfying the condition of Lemma 

Proof. Let log*- -* i = 1 Let ^ j ; ^ c = d — 1 = 3, there are maximally 
n2~ c _1 many i < n/2 that not satisfy: 

K(log (i) i|(log (i+1) i)*, (log (c) i)*,n) ^ log {j+1) z - c'. 

Therefore, maximally $(c+1)2^{-c'-1} n = n/8$ many $i < n/2$ do not satisfy the above equation for some $j = 0, \ldots, c$. The decomposition in equation (7) finishes the proof. □



Proof of Proposition 5.1. Let $m_t$ be an enumeration of the universal enumerable semimeasure $m$, such that for all $t$ there is maximally one $x \in 2^n$ with

$m_t(x) \neq m_{t+1}(x)$.

Additionally assume that for all $k < 2^{n/2}$ for which there is a $t$ such that

$\sum\{m_{t-1}(x) : x \in 2^n\} < k 2^{-n/2} \leq \sum\{m_t(x) : x \in 2^n\}$, (9)

there is a $z_k \in 2^n$ such that $m_t(z_k) \leq 2^{-n}$ and $m_{t+1}(z_k) > 2^{-n}$. Remark that for any such $k$

$z_k \leftrightarrow k$.

Remark that equation (9) is very similar to the requirement $\Omega^n_{t-1} < k 2^{-n/2} \leq \Omega^n_t$. However, to reduce technical details, this equivalent formulation of the proof was preferred.

By Lemma 13.21 one has K(Q n ' n \n) ^ + n. For n large enough, let 
y g 2™/ 2 as in Lemma [5.21 with x = Q n,n , and i chosen such that 

K(x) =+ n, ^ 

and K(i\x) large enough such that it will satisfy some upper bounds 
determined later in the proof. Remark that Xi — — 1 and yi = 0, and 
therefore Vt n,% ~ 1 ^ y < Vt n ' % . This shows that y determines a k ^ 2 n//2 
such that equation (JH]) is satisfied, and since ft 71 ' 1 " 1 ^ ^{m t (x) : x G 2 n }, 
the corresponding t satisfies t ^ Let 2; = Zk- Remark that 

z < — ► y. 

This implies that K(z) ^ + n/2. , therefore i ^ + fc^. At time £j, one 
has m t (z) ^ 2 _n , and by a time-bound version of the coding theorem, 
Kt,_ ow {z) ^ + n. By Lemma l5.2[ I(y;H) ^ + i ^ + k z . Therefore, z 
satisfies the right condition of both claims of Proposition 15.11 

Let Z be a minimal SS. xhx Since w can be computed by first com- 
puting Z, and than the corresponding information of ||logZ||, and the 
total code to do this is shorter than K(z) + 0(1), it follows by Lemma 
IO that: 

**— >Z— >jf. (11) 

Now the left condition of the first claim of Proposition 15.11 is shown. 
Choose in addition of the requirements mentioned in ffTUl) the upper 
bound for K{i) of Proposition 15.31 

K(i\z) = + K(i) ^ + logi + log {2) i + log (3) %. 

Such % exists by Lemma 15.51 Lemma 15.31 shows that for any j with 

i 1/c > j > i c - 

K{j\z) > log (3) i-0(log {4) ), 
and therefore, assuming log (3) i > 0(1) one has: 

Z*y^j. (12) 

Combined with equation (TlTi) . this shows that either Zf < i l / c or either 
Zf > i c . By Proposition 14.31 it follows that 

l x > + k x - 2 log k x ^ + i - 2 logi, 

and therefore, If > i c . This shows the left condition of the first claim of 
Proposition 15. 11 

Now the left condition of the second claim is shown. Let c > 0(1), and 
choose for some 2 C ~ 1 < i 4, 2 C , such that K(i\x) > c. Let i = i2 lo s n ~ c -\ 
Remark that % = 0(n). By Lemma l5\4"t for — 2~ c ) $C j ^ i(l + 2~ c ) we 
have i^(j|x) ^ + c and therefore equation (Tl2|) holds. The same reasoning 
as in the previous paragraph shows the left condition of the second claim 
of Proposition 15.11 □ 

Proposition 5.1 shows that there can be a difference between the minimal SS and the information carried in the initial bits of the Halting sequence. However, the proposition does not address the question whether this difference is substantial with respect to an attempt to interpret algorithms whose design was inspired by the use of minimal SS. The first claim of Proposition 5.1 can only be satisfied for $n$ sufficiently large, compared to the $O(1)$ constants. To obtain equation (12) it is assumed that $\log^{(3)} i \geq O(1)$, therefore,

$n \geq i \geq 2^{2^{2^{O(1)}}}$.

Even if it is assumed that the arbitrary constants are very low, suppose that $O(1) = 4$ could be chosen in the above equation, the corresponding $n$ is much larger than the length of any data that can possibly be the input of an algorithm. In the proof of the second claim of Proposition 5.1 the constructed $\nu$ satisfies $\nu \leq 2^{-c}$, which implies that for large $c$ the largest fraction of the information of the minimal SS of the constructed $z$ in the proof is Halting information. Therefore the result in this paper is only a partial result addressing the possible interpretation of the minimal SS as containing Halting information.



6 Weak sufficient statistics 

A variant of the definition of a SS is proposed: the weak sufficient statistic (WSS). A criterion relating the WSS and the SS is provided. The WSS is defined such that the minimal WSS is equivalent with an initial segment of the Halting sequence relative to a plain Turing machine. An explicit construction will be given to convert an initial segment of the Halting sequence into a minimal WSS and to convert a minimal WSS into an initial segment of the Halting sequence.

The reason why a minimal SS, as defined above, is not equivalent with an initial segment of the Halting sequence, is that the length of that segment carries information that would be available in the description of $x$, while this information does not contribute to the compression of $x$. If the minimal SS is encoded such that the information of the length of the minimal SS does not "count", then there is an equivalence. It turns out that this is possible by conditioning the complexities of $x, Z$ on $C(Z)$ in the definition of a SS, where $C(Z)$ is the Kolmogorov complexity with respect to a plain Turing machine. Let $\Psi$ be a plain Turing machine, then $C(x) = \min\{l(p) : \Psi(p,n) \downarrow= x\}$. The following equation relates prefix-free and plain Kolmogorov complexity [11]:

$C(x) =^+ K(x|C(x))$ (13)

Definition 6.1. Let $x \in 2^n$.
• A finite set $S \subseteq 2^n$ is a weak sufficient set statistic of a binary string $x$ iff $x \in S$ and

$C(S) + \log|S| =^+ K(x|C(S))$. (14)

• A computable semimeasure $P$ over $2^n$ is a weak sufficient probabilistic statistic of a binary string $x$ iff

$C(P) - \log P(x) =^+ K(x|C(P))$. (15)

• A total function $F : 2^{<n} \to 2^n$ is a weak sufficient function statistic of a binary string $x$ iff

$C(F) + \min\{l(d) : F(d) = x\} =^+ K(x|C(F))$.

For $Z = S, P$, the minimal weak sufficient statistic $Z'_x$ is the weak sufficient statistic $Z$ such that $C(Z)$ is minimal within some constant. Let $l'^Z_x = C(Z'_x)$.

In the same way as in Proposition 4.2, for any $x$, a probabilistic weak sufficient statistic (probabilistic WSS) is algorithmically equivalent with a function WSS.

Let $\|\log Z\|$ be either $\log|S|$, $-\log P(x)$, or $\min\{l(d) : F(d) = x\}$. Then the defining equation for a WSS is given by:

$C(Z) + \|\log Z\| =^+ K(x|C(Z))$.

By Lemma 2.1 it follows that there are only finitely many SSes. By Proposition 6.2 there can be a large number of WSSes for a string $x$.

Proposition 6.2. If $K(x) \geq^+ n$, then $x$ has $O(n)$ different WSSes.

Proof. Let $i$ be such that $K(x|i) =^+ n$. By Lemma 5.6 there are $O(n)$ such $i$. In the same way as at the beginning of the proof of Lemma 5.2 it follows that $K(x^i|i) =^+ i$. Let

$S_i = \{x^i v : v \in 2^{n-i}\}$.

Remark that $K(S_i|i) =^+ K(x^i|i) =^+ i$ and thus by equation (13) $C(S_i) =^+ i$. Also remark that $\log|S_i| = n - i$. This shows that $S_i$ satisfies equation (14). □
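The sets $S_i$ used in this proof are easy to write down explicitly for short strings. The following sketch only illustrates their combinatorial part, namely that $x \in S_i$ and $\log|S_i| = n - i$; the complexity claims are not computable and are not checked. The function name is hypothetical.

```python
from itertools import product
from math import log2


def prefix_model(x: str, i: int) -> set:
    """All strings of length len(x) that share the first i bits with x."""
    if not 0 <= i <= len(x):
        raise ValueError("i must lie between 0 and the length of x")
    head = x[:i]
    # Append every possible continuation v of length n - i.
    return {head + "".join(v) for v in product("01", repeat=len(x) - i)}
```

For `x = "1011"` and `i = 2` this yields the four strings starting with `10`, so the two-part description costs $i$ bits for the prefix plus $n - i$ bits for the index inside the set.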

Proposition 6.3. For $Z = S, P, F$, if $Z$ is a SS of $x \in 2^n$, and

$Z, K(Z) \to C(Z)$,

then $Z$ is a WSS of $x$.

Proof. Remark that by equation (15) every WSS $Z$ defines a shortest description of $x$ given $C(Z)$ on a prefix-free Turing machine. By the conditioned version of Lemma 2.1 it follows that

$x, K(x|C(Z)), C(Z) \to Z$.

By the assumption of the proposition

$Z^* \to C(Z)$.

One also has $K(x) =^+ K(x, K(x))$, and its conditioned equivalent. Therefore:

$K(x|C(Z)) =^+ K(x, K(x|C(Z))|C(Z))$

$=^+ K(x, Z|C(Z))$

$=^+ K(x|Z^*, C(Z)) + K(Z|C(Z))$

$=^+ K(x|Z^*) + K(Z|C(Z))$

$=^+ K(x) - K(Z) + K(Z|C(Z))$

$\|\log Z\| =^+ K(x) - K(Z) =^+ K(x|C(Z)) - C(Z)$.

□

The question arises whether $Z, K(Z) \to C(Z)$. Let ${}^k 2$ be the tetration with base 2 and height $k$, that is, the $k$-th iteration of taking the power of 2. The inverse of the tetration function is the super-logarithm, that is,

$\mathrm{slog}\, x = \max\{k : {}^k 2 \leq x\}$.
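To give a feeling for how slowly the super-logarithm grows, the definition can be evaluated directly on integers. This is a minimal sketch assuming the convention that the tetration of height 0 equals 1, which matches the sequence 1, 2, 4, 16, ... given in the footnote of the introduction; the function names are chosen for this illustration.

```python
def tetration(k: int) -> int:
    """Tetration with base 2 and height k: a tower of k twos."""
    value = 1
    for _ in range(k):
        value = 2 ** value
    return value


def slog(x: int) -> int:
    """Super-logarithm: the largest k such that tetration(k) <= x."""
    if x < 1:
        raise ValueError("slog is only defined here for x >= 1")
    k = 0
    while tetration(k + 1) <= x:
        k += 1
    return k
```

Every integer below 65536 has a super-logarithm of at most 3, and every integer below $2^{65536}$ has a super-logarithm of at most 4, which is why bounds of this order are negligible for strings of realistic length.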

Lemma 6.4.

$K(C(x)|x, K(x)) \leq^+ O(\mathrm{slog}\, x)$.

Proof. $C(x)$ is approximated as:

$k_1 = K(x)$

$k_2 = K(x|k_1^*) = K(x|K(x)^*)$

$k_3 = K(x|k_2^*) = K(x|K(x|K(x)^*)^*)$

$k_i = K(x|k_{i-1}^*) = K(x|K(x|\ldots^*)^*)$.

Remark that since $k_1 \leq^+ 2\log x$, it follows that $k_1 - k_2 \leq^+ 2\log^{(2)} x$. Suppose that

$\mathrm{abs}(k_{i-1} - k_i) \leq^+ 2\log^{(i)} x$,
then it follows that

$\mathrm{abs}(k_i - k_{i+1}) \leq^+ \mathrm{abs}(K(x|k_i) - K(x|k_{i+1}))$

$\leq^+ 2\log \mathrm{abs}(k_i - k_{i+1})$

$\leq^+ 2\log^{(i+1)} x$,

and therefore the series has converged after $\mathrm{slog}\, x$ steps, within a constant. The limit of the series is some $k$ for which $K(x|k) =^+ k$. There is only one value $k$ that for some $x$ satisfies $K(x|k) =^+ k$, since if there was also an $l < k$ such that $K(x|l) =^+ l$, then

$k - l =^+ K(x|k) - K(x|l) \leq^+ 2\log(k - l)$,

and therefore, $k =^+ l$. Remark that the proof of equation (13), see [11, Lemma 3.1.1], also shows that

$C(x) =^+ K(x|C(x)^*)$.

Therefore, it follows that this series $k_i$ converges to $C(x)$. To prove the lemma, it suffices to show that the evaluation of $k_{i+1}, K(x, k_{i+1})$ from $k_i, K(x, k_i), x$ requires at most a constant number of bits. First remark that for any $u, v$ [3]:

$K(u,v) =^+ K(K(u|v^*), u, v)$.

Since there are maximally a constant number of programs of length $K(u,v)$ that produce $u, v$, $K(u|v^*)$ can be found within $O(1)$ bits from $u, v, K(u,v)$. Replacing $u = x$ and $v = k_i$ shows that $k_{i+1}$ can be computed from $x, k_i, K(x, k_i)$. In a similar way, it is shown that $K(x, k_{i+1})$ can be computed from $K(x, k_i, k_{i+1})$. Therefore, $k_{i+1}, K(x, k_{i+1})$ can be computed from $k_i, K(x, k_i)$. □



By Lemma 6.4 and Proposition 6.3 it can be stated that for strings of realistic length, every WSS is a SS. This is why the name weak sufficient statistic was chosen. It contrasts with the name strong sufficient statistic defined in [13].

An explicit construction of a probabilistic WSS $P'_x$ for an $x \in 2^n$ is now given. Remark that in [8] a construction is given of what is called an "Explicit minimal near-sufficient statistic". The construction there can be adapted to a construction of a set WSS using the same ideas as the construction of $P'_x$. The construction of $P'_x$ makes use of $k'_x$, a variation of m-depth, which will be called BB-depth since it uses the Busy Beaver function. Assume $c$ is large enough:

BB(k) = max{^(p) : p £ 2 k } 

$k'_x = \min\{k : K_{BB(k)}(x|k) =^+ K(x|k)\}$

■ 2 -K BB[k , x){ yW x)+k > x ^ if KBB{K){y \ K) ^ K BB{k ^ x) ( y \k' x ), 

otherwise. 



p'M 
Proposition 6.5. $P'_x$ is a probabilistic WSS.

Let $k_{x|l}$ be the conditional m-depth, that is, the depth according to Definition 3.1 with the semimeasure $m$ replaced by the conditional semimeasure $m(\cdot|l)$. The following Lemma shows a relation between conditional m-depth and BB-depth, which is similar to equation (13).



Lemma 6.6. 



k' x — + k x \ k / x 



Proof. It suffices to show that: 

BB{k) ^ *fc+o(i)|fc 

t k \ k ^ BB(k + 0(l)) 

The first inequality follows by remarking that any program of length
$k$ halting on a plain Turing machine can be adapted, by adding a constant
number of instructions, to a program of length $k + O(1)$ halting on
a prefix-free Turing machine given $k$.

The second inequality follows by remarking that $\Omega^{n,k|k}$, the conditional
version of $\Omega^{n,k}$, defines, by adding a finite number of instructions,
a halting program on a plain Turing machine that outputs $t_{k|k}$. □

Proof of Proposition 6.5. First it will be shown that $P'_x$ is a semimeasure.
Let

$$m_t(y|l) = 2^{-K_t(y|l)}.$$

For $d$ large enough, by Lemma 6.6:

$$\sum_y \left( m_{BB(k)}(y|k) - m_{BB(k-1)}(y|k) \right) \le 2^{-k+d}.$$

Choosing $c$ in the definition of $P'_x$ large enough shows that $P'_x$ is a
semimeasure.

Now it remains to show that $P'_x$ satisfies the defining equation (15)
of a probabilistic WSS. Remark that given $C(P)$, a program for $P$ on a
plain Turing machine can be turned into a program for $P$ given $C(P)$
on a prefix-free Turing machine by adding a constant number of instructions.
Using the Shannon-Fano code [11], this shows that
$C(P) - \log P(x) \ge^{+} K(x|C(P))$. By the choice of $m$, one also has that

$$P'_x(x) \ge 2^{k'_x-c}\left(m_{BB(k'_x)}(x|k'_x) - m_{BB(k'_x-1)}(x|k'_x)\right) \ge 2^{k'_x-c}\,\tfrac{1}{2}\,m_{BB(k'_x)}(x|k'_x) = 2^{-K_{BB(k'_x)}(x|k'_x)+k'_x-c-1}.$$

This shows the $\le^{+}$ inequality of equation (15). □
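The Shannon-Fano step used in this proof can be illustrated on a toy example. The sketch below is only an illustration under stated assumptions: the semimeasure `P` and the helper names `shannon_fano_lengths` and `kraft_sum` are hypothetical and do not come from the paper. It shows that codeword lengths of $\lceil -\log_2 P(y) \rceil$ satisfy the Kraft inequality, which is why a prefix-free code with these lengths exists and why $-\log P(x)$ bits suffice, up to a constant, to describe $x$ once $P$ is known.

```python
import math

def shannon_fano_lengths(probs):
    """Map each outcome y with P(y) > 0 to the code length ceil(-log2 P(y))."""
    return {y: math.ceil(-math.log2(p)) for y, p in probs.items() if p > 0}

def kraft_sum(lengths):
    """Sum of 2^(-length) over all codewords; a prefix-free code exists iff this is <= 1."""
    return sum(2.0 ** -l for l in lengths.values())

# Hypothetical toy semimeasure on three strings (total mass 0.875 < 1).
P = {"00": 0.5, "01": 0.25, "10": 0.125}
lengths = shannon_fano_lengths(P)
print(lengths, kraft_sum(lengths))
```

Because the masses sum to at most 1, the Kraft sum is at most 1 as well, so the lengths can always be realised by a prefix-free code.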
Proposition 6.7.

$$P'_x \longleftrightarrow BB(k'_x)$$

Before the proof is given, a technical result is stated.

Lemma 6.8. For any $t \ge BB(k - O(1))$:

$$t, k \longrightarrow BB(k).$$

Proof. Let $c$ be such that $t \ge BB(k - c)$. Remark that $t, k \longrightarrow BB(k - c)$.
Let

$$S_k = \{p \in 2^k : \Psi(p) > BB(k-c)\}.$$

Suppose that $|S_k| > f(k)$, with $f$ unbounded. Let $p_1, p_2, \ldots$ be an
enumeration of all binary strings in $2^k$, ordered by increasing computation time
on $\Psi$. Remark that $l \longleftrightarrow p_l$. Given $BB(k - c), k$, the set $S_k$ appears at
the end of this enumeration. Therefore, there is some element $p_l \in S_k$
such that its index $l$ ends with $\log f(k) - 1$ zeros. Such an $l$ has plain complexity
below $k - \log f(k) + 2\log\log f(k)$. Therefore, $l$ can be transformed into a
program that has an output above $BB(k - c)$ and has length unboundedly
below $k - c$, which contradicts the definition of $BB(k - c)$. □
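The counting step in this proof relies on the fact that any run of $m$ consecutive indices contains one whose binary expansion ends in roughly $\log m$ zeros, so that this index can be described with about $\log m$ fewer bits. The following sketch checks that fact numerically; the function names are hypothetical and the code only illustrates the combinatorial claim, not the complexity bound itself.

```python
def trailing_zeros(i):
    """Number of trailing zero bits in the binary expansion of a positive integer."""
    return (i & -i).bit_length() - 1

def index_with_zeros(start, m):
    """Return an index l in [start, start + m) with at least floor(log2 m) trailing zeros."""
    j = m.bit_length() - 1           # floor(log2 m)
    step = 1 << j                    # 2^j <= m
    l = -(-start // step) * step     # smallest multiple of 2^j that is >= start
    assert l < start + m             # holds because step <= m
    return l

# Example: the last 37 indices of an enumeration of 2^10 strings.
l = index_with_zeros(2**10 - 37 + 1, 37)
print(l, bin(l), trailing_zeros(l))
```

Here the window plays the role of the indices of the elements of $S_k$ at the end of the enumeration, and `m` plays the role of $f(k)$.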



Proof of Proposition 6.7. The left $\longleftarrow$ follows from the definition of
$P'_x$. It remains to show the right $\longrightarrow$. By Lemma 6.8 it suffices to show
that $P'_x, k'_x \longrightarrow t, k'_x$ with $t \ge BB(k'_x - O(1))$. Let $z$ be the lexicographically
first string with $m_{BB(k'_x-c)}(z|k'_x) \le 2^{-n}$, for some constant $c$ large enough,
then it follows that

$$-\log m_{BB(k'_x)}(z|k'_x) \le^{+} k'_x + 2\log k'_x.$$

Therefore, by estimating $BB(k'_x)$ and $BB(k'_x - 1)$ with computations of at most
$t$ steps for increasing $t$, and using $k'_x$, one can only find an equality for
$P'_x(z)$ for $t \ge BB(k'_x - c)$. Therefore, $P'_x, k'_x \longrightarrow t, k'_x$. □



7 Minimal typical model 

Typical set models were studied in [12], and it was shown that within
logarithmic bounds, the complexities of the minimal typical set and the
minimal SS were equal. It is shown here that a minimal typical model
is equivalent with a minimal WSS and that their complexities are therefore
equal within constant bounds, for set, probabilistic and functional
models alike. Therefore, the minimal typical model is also equivalent to some
initial segment of the Halting sequence.
Definition 7.1. • Let $S^*$ denote the shortest program on a plain
Turing machine that computes $S$. A finite set $S$ is a typical set for a
binary string $x$ iff $x \in S$ and

$$\log|S| =^{+} K(x|S^*).$$

• Let $P^*$ denote the shortest program on a plain Turing machine that
computes $P$. A computable semimeasure $P$ is a typical semimeasure
for a binary string $x$ iff

$$-\log P(x) =^{+} K(x|P^*).$$

• Let $F^*$ denote the shortest program on a plain Turing machine
that computes $F$. A computable function $F : \omega \to \omega$ is a typical
function for a binary string $x$ iff

$$\exists d\,[F(d) = x \wedge l(d) =^{+} K(x|F^*)].$$

For $Z = S, P, F$, a minimal typical model is a typical model $Z$ such
that $K(Z)$ is minimal within a constant.

The same proof as for Proposition 4.2 also shows that the set of functional
typical models is the same as the set of probabilistic typical models. Remark
that in [12], a set typical model is defined by $\log|S| =^{+} K(x|S)$. In the
definition above, $S$ is replaced by its minimal description with respect to a
plain Turing machine. Since [12] only considers equalities of functions
within logarithmic terms of $n$, both in value and in argument, the results
shown there also remain valid using the definition above. By Lemma
6.4 the results also hold within $O(\mathrm{slog})$ terms if $Z^*$ is the shortest
representation on a prefix-free Turing machine.

Proposition 7.2. Every WSS for $x \in 2^n$ is also a typical model (TM)
for $x \in 2^n$.

Proof. Recall that for any WSS $Z$:

$$x, C(Z), K(x|C(Z)) \longrightarrow Z.$$

Therefore:

$$\begin{aligned}
K(x|Z^*) &=^{+} K(x|Z^*, C(Z)) \\
&=^{+} K(x,Z|C(Z)) - K(Z|C(Z)) \\
&=^{+} K(x|C(Z)) - K(Z|C(Z)) \\
&=^{+} \|\log Z\|,
\end{aligned}$$

where $\|\log Z\|$ is either $\log|S|$, $-\log P(x)$, or $\min\{l(d) : F(d) = x\}$. □
By the same example as in [12], it follows that there are TMs that
are not WSSs. According to Proposition 7.3, $P'_x$ defines a minimal TM,
and by Corollary 7.6 a minimal WSS is equivalent with the minimal TM,
which is equivalent with an initial segment of the Halting sequence.

Proposition 7.3. If $P$ is a probabilistic TM for $x$, then $C(P) \ge^{+} k'_x$.

Before the proposition is proved, Lemma 7.4 is shown.

Lemma 7.4. For some large enough computable function $f$:

$$K_t(x,y) \ge^{+} K_{f(t,n)}(x) + K_{f(t,n)}(y|x, K_{f(t,n)}(x)).$$

The proof is essentially the same as that of the additivity of prefix-free
Kolmogorov complexity [11], but formulated with time bounds.

Proof. Let

$$m_t(x,y) = \sum \{2^{-l(p)} : \Phi_t(p)\downarrow = [x,y]\}$$

$$S_x = \{p : \Phi_t(p)\downarrow = [x,z] \text{ for some } z\}.$$

Remark that $S_x$ can be enumerated from $x$, and by the coding theorem:

$$K_{f(t,n)}(x) \le^{+} -\log \sum_z m_t(x,z) = -\log \sum \{2^{-l(p)} : p \in S_x\}.$$

Therefore,

$$P(z) = 2^{K_{f(t,n)}(x) - O(1)}\, m_t(x,z)$$

defines a conditional semimeasure that can be computed from $x, K_{f(t,n)}(x)$
in time $t$. The Shannon-Fano code shows that for $f$ large enough:

$$K_{f(t,n)}(y|x, K_{f(t,n)}(x)) \le^{+} K_t(x,y) - K_{f(t,n)}(x).$$

□



Proof of Proposition 7.3. Let $P$ be a TM, then it will be shown that

$$K(x|C(P)) =^{+} K_{BB(C(P)+O(1))}(x|C(P)). \qquad (16)$$

If $C(P)$ was unboundedly below $k'_x$, this would contradict the definition
of $k'_x$. Therefore it remains to show equation (16).

$$\begin{aligned}
C(P) - \log P(x) &=^{+} C(P) + K(x|P^*) \\
&=^{+} K(P|C(P)) + K(x|P^*, C(P)) \qquad (17) \\
&=^{+} K(x,P|C(P)) \\
&=^{+} K(x|C(P)) + K(P|x, K(x|C(P)), C(P)). \qquad (18)
\end{aligned}$$



On the other side, let $s$ be the computation time to compute $-\log P(z)$
for all $z \in 2^n$ from $P^*$. Then $K_s(x|P^*) =^{+} -\log P(x)$. For computable
functions $f, g$ large enough we have by Lemma 7.4:

$$\begin{aligned}
C(P) - \log P(x) &=^{+} C(P) + K_s(x|P^*) \\
&\ge^{+} K_{g(s)}(x,P|C(P)) \\
&\ge^{+} K_{f(s)}(x|C(P)) + K_{f(s)}(P|x, K_{f(s)}(x|C(P)), C(P)).
\end{aligned}$$

Let $\Delta = K_{f(s)}(x|C(P)) - K(x|C(P)) \ge 0$, then combining equation
(18) with the last inequality above:

$$\begin{aligned}
K(x|C(P)) &+ K(P|x, K(x|C(P)), C(P)) \\
&\ge^{+} K_{f(s)}(x|C(P)) + K_{f(s)}(P|x, K_{f(s)}(x|C(P)), C(P)) \\
&\ge^{+} K(x|C(P)) + \Delta + K(P|x, K(x|C(P)), C(P)) - 2\log\Delta.
\end{aligned}$$

This shows that $0 \ge^{+} \Delta - 2\log\Delta$, and therefore $\Delta =^{+} 0$. Since
$BB(C(P) + O(1)) \ge s$, equation (16) is satisfied. □
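The final step of this proof, from a bound on $\Delta - 2\log\Delta$ to a bound on $\Delta$, can be spelled out as follows. This is a sketch of one standard way to see it, with logarithms taken to base 2; the constant $c'$ and the threshold 16 are illustrative choices, not taken from the paper.

```latex
% Assume \Delta - 2\log_2\Delta \le c' for some constant c' independent of x.
% For \Delta \ge 16 one has 2\log_2\Delta \le \Delta/2
% (equality at \Delta = 16, and \Delta/2 - 2\log_2\Delta is increasing beyond that point).
\[
  \Delta \ge 16
  \;\Longrightarrow\;
  \frac{\Delta}{2} \;\le\; \Delta - 2\log_2\Delta \;\le\; c'
  \;\Longrightarrow\;
  \Delta \le 2c' .
\]
% Hence in all cases
\[
  0 \le \Delta \le \max\{16,\, 2c'\} = O(1),
  \qquad\text{that is,}\qquad \Delta =^{+} 0 .
\]
```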

Corollary 7.5. $P'_x$ defines a minimal typical probabilistic model.

Proof. Since $P'_x$ is a WSS, it is also a TM, and since $C(P'_x) =^{+} k'_x$,
there is no TM which is smaller by more than a constant. □

Let ${H'}^{n}$ be the Halting sequence relative to a plain Turing machine,
conditioned on $n$. That is, ${H'}^{n}_{i} = 1$ if $\Psi(i,n)\downarrow$, and ${H'}^{n}_{i} = 0$ otherwise.
Corollary 7.6 shows that a probabilistic minimal TM is equivalent with an
initial segment of ${H'}^{n}$.

Corollary 7.6. If $P$ is a minimal typical probabilistic model, and $P^*$ its
minimal description on a plain Turing machine, then

$$P^* \longleftrightarrow {H'}^{n,2^{k'_x}} \longleftrightarrow (P'_x)^*.$$

Proof. Remark that by Corollary 7.5, we have that $C(P) =^{+} k'_x$. From
the proof of Proposition 7.3, equation (16) actually shows that if $s$ is the
maximal time needed to evaluate a Shannon-Fano code according to $P(y)$ for any
$y \in 2^n$, then:

$$K(x|C(P)) =^{+} K_s(x|C(P)).$$

This shows that $s \ge BB(k'_x - O(1))$. Remark that $s$ can be computed
from $P$, therefore,

$$s \le BB(C(P) + O(1)) \le BB(k'_x + O(1)).$$

Let $p$ be the program of length $k'_x$ with largest output, then

$$P^* \longrightarrow p \longleftrightarrow {H'}^{n,2^{k'_x}}.$$

The last $\longleftrightarrow$ follows from Proposition 6.7. □